
    Chaining Test Cases for Reactive System Testing (extended version)

    Testing of synchronous reactive systems is challenging because long input sequences are often needed to drive them into a state at which a desired feature can be tested. This is particularly problematic in on-target testing, where a system is tested in its real-life application environment and the time required for resetting is high. This paper presents an approach to discovering a test case chain: a single software execution that covers a group of test goals and minimises overall test execution time. Our technique targets the scenario in which test goals for the requirements are given as safety properties. We give conditions for the existence and minimality of a single test case chain, and we minimise the number of chains when a single chain is infeasible. We report experimental results with a prototype tool for C code generated from Simulink models and compare it to state-of-the-art test suite generators. (Comment: extended version of the paper published at ICTSS'13.)
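
    The chaining problem resembles an asymmetric travelling-salesman path over test goals. As a rough illustration only, the sketch below greedily orders goals by pairwise driving cost; `init_cost`, `drive_cost`, and `greedy_chain` are hypothetical names, and a nearest-neighbour heuristic gives none of the existence or minimality guarantees the paper establishes.

```python
import math

def greedy_chain(n_goals, init_cost, drive_cost):
    """Order test goals into one chain, greedily minimising driving steps.

    init_cost[g]     -- input steps from reset to cover goal g
    drive_cost[i][j] -- input steps from goal i's state to cover goal j
                        (math.inf if j is unreachable from there)
    """
    start = min(range(n_goals), key=lambda g: init_cost[g])
    chain, remaining = [start], set(range(n_goals)) - {start}
    total = init_cost[start]
    while remaining:
        cur = chain[-1]
        nxt = min(remaining, key=lambda g: drive_cost[cur][g])
        if math.isinf(drive_cost[cur][nxt]):
            break  # no single chain covers everything; a second chain is needed
        total += drive_cost[cur][nxt]
        chain.append(nxt)
        remaining.remove(nxt)
    return chain, sorted(remaining), total

# Example: three goals where the chain 0 -> 2 -> 1 is cheaper than any resets.
print(greedy_chain(3, [2, 9, 8], [[0, 7, 1], [6, 0, 5], [3, 1, 0]]))
```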

    Learning Concise Models from Long Execution Traces

    Abstract models of system-level behaviour have applications in design exploration, analysis, testing and verification. We describe a new algorithm for automatically extracting useful models, as automata, from execution traces of a HW/SW system driven by software exercising a use-case of interest. Our algorithm leverages modern program synthesis techniques to generate predicates on automaton edges, succinctly describing system behaviour, and employs trace segmentation to tackle the complexity of long traces. We learn concise models capturing transaction-level, system-wide behaviour, and we experimentally demonstrate the approach using traces from a variety of sources, including the x86 QEMU virtual platform and the Real-Time Linux kernel.
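
    A common starting point for learning automata from traces is a prefix-tree automaton that is later generalised by state merging; the paper goes well beyond this, synthesising edge predicates and segmenting long traces. The sketch below covers only the prefix-tree construction; `build_pta` and the event names are hypothetical.

```python
from collections import defaultdict

def build_pta(traces):
    """Fold a list of event traces into a prefix-tree automaton.

    Returns delta: state -> {event: successor}, with 0 as the initial state.
    """
    delta, next_state = defaultdict(dict), 1
    for trace in traces:
        state = 0
        for event in trace:
            if event not in delta[state]:
                delta[state][event] = next_state
                next_state += 1
            state = delta[state][event]
    return dict(delta)

# Example: two traces of a toy HW/SW transaction that share a prefix.
print(build_pta([["req", "grant", "done"], ["req", "grant", "retry"]]))
```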

    Unbounded safety verification for hardware using software analyzers

    Demand for scalable hardware verification is ever-increasing. We propose an unbounded safety verification framework for hardware, at the heart of which is a software verifier. To this end, we synthesize Verilog at register transfer level into a software-netlist, represented as a word-level ANSI-C program. The proposed tool flow allows us to leverage the precision and scalability of state-of-the-art software verification techniques. In particular, we evaluate unbounded proof techniques, such as predicate abstraction, k-induction, interpolation, and IC3/PDR, and we compare the performance of verification tools from the hardware and software domains that use these techniques. To the best of our knowledge, this is the first attempt to perform unbounded verification of hardware using software analyzers.
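
    To make the flow concrete, here is a rough sketch of the k-induction back-end applied to a hand-written stand-in for a software-netlist: a word-level transition function for a mod-10 counter, checked with the z3 SMT solver. The counter, `step`, `prop`, and `k_induction` are hypothetical illustrations, not the paper's tool chain, which synthesises real Verilog into ANSI-C and runs software verifiers on that program.

```python
# pip install z3-solver
from z3 import And, BitVec, BitVecVal, If, Not, Solver, ULT, unsat

WIDTH = 8

def step(s):
    """Word-level transition function of a toy mod-10 counter."""
    return If(s == 9, BitVecVal(0, WIDTH), s + 1)

def prop(s):
    """Safety property: the counter never reaches 10 (unsigned compare)."""
    return ULT(s, 10)

def k_induction(k):
    s = [BitVec(f"s{i}", WIDTH) for i in range(k + 1)]
    unroll = And(*[s[i + 1] == step(s[i]) for i in range(k)])
    base = Solver()  # base case: no violation within k steps of reset
    base.add(s[0] == 0, unroll, Not(And(*[prop(x) for x in s])))
    if base.check() != unsat:
        return False
    ind = Solver()   # step case: k safe states imply a safe successor
    ind.add(unroll, And(*[prop(s[i]) for i in range(k)]), Not(prop(s[k])))
    return ind.check() == unsat

print(k_induction(1))  # True: the property is 1-inductive for this design
```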

    Hardware/Software Co-verification Using Path-based Symbolic Execution

    Conventional tools for formal hardware/software co-verification use bounded model checking techniques to construct a single monolithic propositional formula. Formulas generated in this way are extremely complex, contain a great deal of irrelevant logic, and are hence difficult to solve even for state-of-the-art Satisfiability (SAT) solvers. In a typical hardware/software co-design, the firmware only exercises a fraction of the hardware state space, and we can use this observation to generate simpler and more concise formulas. In this paper, we present a novel verification algorithm for hardware/software co-designs that identifies the partitions of the firmware and the hardware logic pertaining to the feasible execution paths, by means of path-based symbolic simulation with custom path pruning, property-guided slicing, and incremental SAT solving. We have implemented this approach in our tool COVERIF and experimentally compared it with HW-CBMC, a monolithic BMC-based co-verification tool, observing an average speed-up of 5× over HW-CBMC for proving safety properties, as well as for detecting critical co-design bugs in an open-source Universal Asynchronous Receiver-Transmitter (UART) design and a large SoC design.
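
    As a toy illustration of path-based exploration with pruning and incremental solving (not COVERIF itself), the sketch below walks the branch decisions of an imaginary firmware run and abandons a path as soon as the solver refutes its path condition; `x`, `conds`, and `explore` are hypothetical.

```python
# pip install z3-solver
from z3 import BitVec, Not, Solver, sat

x = BitVec("x", 8)
# Branch conditions at three decision points of an imaginary firmware run.
conds = [x > 10, (x & 1) == 1, x < 4]

def explore(solver, depth, path):
    """DFS over branch decisions, pruning paths whose condition is UNSAT."""
    if depth == len(conds):
        solver.check()  # refresh the model so we can print a witness
        print("feasible path:", path, "witness x =", solver.model()[x])
        return
    for taken in (True, False):
        solver.push()                       # incremental solving: keep context
        solver.add(conds[depth] if taken else Not(conds[depth]))
        if solver.check() == sat:           # prune infeasible branches early
            explore(solver, depth + 1, path + [taken])
        solver.pop()

explore(Solver(), 0, [])   # note: the path [True, _, True] is never explored
```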

    DeepSynth: Automata Synthesis for Automatic Task Segmentation in Deep Reinforcement Learning

    This paper proposes DeepSynth, a method for effective training of deep Reinforcement Learning (RL) agents when the reward is sparse and non-Markovian, but at the same time progress towards the reward requires achieving an unknown sequence of high-level objectives. Our method employs a novel algorithm for synthesis of compact automata to uncover this sequential structure automatically. We synthesise a human-interpretable automaton from trace data collected by exploring the environment. The state space of the environment is then enriched with the synthesised automaton so that the generation of a control policy by deep RL is guided by the discovered structure encoded in the automaton. The proposed approach is able to cope with both high-dimensional, low-level features and unknown sparse non-Markovian rewards. We have evaluated DeepSynth's performance in a set of experiments that includes the Atari game Montezuma's Revenge. Compared to existing approaches, we obtain a reduction of two orders of magnitude in the number of iterations required for policy synthesis, and also a significant improvement in scalability.Comment: Extended version of AAAI 2021 pape
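
    The state-enrichment idea can be miniaturised as follows: run RL on the product of the environment state and the automaton state, so that progress through the objective sequence becomes observable and can be rewarded. The tabular sketch below is a hypothetical stand-in; DeepSynth synthesises the automaton from traces automatically and uses deep RL rather than a Q-table, and the events and names here are invented.

```python
from collections import defaultdict

# Hand-written stand-in for a synthesised automaton: reach state 2 by
# grabbing the key and then opening the door (events are hypothetical).
AUTOMATON = {(0, "got_key"): 1, (1, "opened_door"): 2}
ACCEPT = 2

def automaton_step(q, event):
    """Advance the automaton on an observed high-level event."""
    return AUTOMATON.get((q, event), q)

Q = defaultdict(float)   # Q-values over the *product* state (env, automaton)

def q_update(s, q, a, reward, s2, q2, n_actions=4, alpha=0.1, gamma=0.99):
    """One Q-learning update; the automaton state q disambiguates the
    otherwise non-Markovian sparse reward."""
    bonus = 1.0 if q2 != q else 0.0          # shaping: automaton progress
    best = max(Q[(s2, q2, a2)] for a2 in range(n_actions))
    Q[(s, q, a)] += alpha * (reward + bonus + gamma * best - Q[(s, q, a)])

# Example: an action that triggers "got_key" advances the automaton to 1.
q2 = automaton_step(0, "got_key")
q_update(s=(3, 4), q=0, a=2, reward=0.0, s2=(3, 5), q2=q2)
```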

    Recalibrating classifiers for interpretable abusive content detection

    Dataset and code for the paper 'Recalibrating classifiers for interpretable abusive content detection' by Vidgen et al. (2020), to appear at the NLP + CSS workshop at EMNLP 2020. We provide:
    - 1,000 annotated tweets, sampled using the Davidson classifier in 20 score increments of 0.05 (50 from each), from a dataset of tweets directed against MPs in the UK 2017 General Election.
    - 1,000 annotated tweets, sampled using the Perspective classifier in 20 score increments of 0.05 (50 from each), from the same dataset.
    - Code for recalibration in R and Stan.
    - Annotation guidelines for both datasets.

    Paper abstract: We investigate the use of machine learning classifiers for detecting online abuse in empirical research. We show that uncalibrated classifiers (i.e. where the 'raw' scores are used) align poorly with human evaluations. This limits their use in understanding the dynamics, patterns, and prevalence of online abuse. We examine two widely used classifiers (created by Perspective and Davidson et al.) on a dataset of tweets directed against candidates in the UK's 2017 general election. We present a Bayesian approach to recalibrating the raw scores from the classifiers, using probabilistic programming and newly annotated data. We argue that interpretability evaluation and recalibration are integral to the application of abusive content classifiers.
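
    For flavour, here is a minimal Platt-style recalibration on synthetic data: fit a logistic curve mapping raw classifier scores to the probability that annotators judge a tweet abusive. The paper instead fits a Bayesian model in Stan on the annotated tweets; the data, seed, and names below are entirely synthetic and hypothetical.

```python
# pip install numpy scikit-learn
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
raw = rng.uniform(size=(1000, 1))                 # raw classifier scores
# Synthetic "human labels", deliberately miscalibrated w.r.t. the raw score.
human = (rng.uniform(size=1000) < raw[:, 0] ** 2).astype(int)

recal = LogisticRegression().fit(raw, human)      # Platt-style recalibration
for s in (0.2, 0.5, 0.8):
    print(f"raw {s:.1f} -> P(abusive) {recal.predict_proba([[s]])[0, 1]:.2f}")
```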